Results 1-20 of 1,007
1.
PLoS One ; 19(3): e0300926, 2024.
Article in English | MEDLINE | ID: mdl-38551907

ABSTRACT

To examine visual speech perception (i.e., lip-reading), we created a multi-layer network (the AV-net) that contained: (1) an auditory layer with nodes representing phonological word-forms and edges connecting words that were phonologically related, and (2) a visual layer with nodes representing the viseme representations of words and edges connecting viseme representations that differed by a single viseme (and additional edges to connect related nodes in the two layers). The results of several computer simulations (in which activation diffused across the network to simulate word identification) are reported and compared to the performance of human participants who identified the same words in a condition in which audio and visual information were both presented (Simulation 1), in an audio-only presentation condition (Simulation 2), and a visual-only presentation condition (Simulation 3). Another simulation (Simulation 4) examined the influence of phonological information on visual speech perception by comparing performance in the multi-layer AV-net to a single-layer network that contained only a visual layer with nodes representing the viseme representations of words and edges connecting viseme representations that differed by a single viseme. We also report the results of several analyses of the errors made by human participants in the visual-only presentation condition. The results of our analyses have implications for future research and training of lip-reading, and for the development of automatic lip-reading devices and software for individuals with certain developmental or acquired disorders or for listeners with normal hearing in noisy conditions.
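
As a rough illustration of the spreading-activation mechanism this abstract describes, the sketch below diffuses activation over a toy two-layer network in Python. The word list, edge structure, retention parameter, and number of diffusion steps are invented for illustration and are not the AV-net's actual materials or dynamics.

```python
# Toy spreading-activation sketch over a two-layer (auditory + visual) lexical network.
# Words, edges, and parameters are illustrative assumptions, not the authors' materials.
import numpy as np

words = ["bat", "pat", "mat", "cat"]          # hypothetical lexicon
n = len(words)

# adjacency within the auditory (phonological) layer and the visual (viseme) layer
A_aud = np.array([[0, 1, 1, 1],
                  [1, 0, 1, 1],
                  [1, 1, 0, 1],
                  [1, 1, 1, 0]], float)       # all four words are phonological neighbors
A_vis = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [1, 1, 0, 0],
                  [0, 0, 0, 0]], float)       # "bat/pat/mat" share a viseme; "cat" does not

# stack the two layers into one supra-adjacency matrix with interlayer links
# connecting each word's auditory node to its own visual node
I = np.eye(n)
A = np.block([[A_aud, I],
              [I, A_vis]])

# simple diffusion: repeatedly spread a fraction of each node's activation to its neighbors
act = np.zeros(2 * n)
act[0] = 1.0                                  # stimulate the auditory node for "bat"
retention = 0.5
spread = A / np.maximum(A.sum(axis=1, keepdims=True), 1)
for _ in range(5):
    act = retention * act + (1 - retention) * spread.T @ act

for w, a_aud, a_vis in zip(words, act[:n], act[n:]):
    print(f"{w}: auditory={a_aud:.3f} visual={a_vis:.3f}")
```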


Subjects
Speech Perception, Humans, Speech Perception/physiology, Visual Perception/physiology, Lipreading, Speech, Linguistics
2.
Ear Hear ; 45(1): 164-173, 2024.
Article in English | MEDLINE | ID: mdl-37491715

ABSTRACT

OBJECTIVES: Speech perception training can be a highly effective intervention to improve perception and language abilities in children who are deaf or hard of hearing. Most studies of speech perception training, however, only measure gains immediately following training. Only a minority of cases include a follow-up assessment after a period without training. A critical unanswered question was whether training-related benefits are retained for a period of time after training has stopped. A primary goal of this investigation was to determine whether children retained training-related benefits 4 to 6 weeks after they completed 16 hours of formal speech perception training. Training comprised either auditory or speechreading training, or a combination of both. It is also important to determine whether "booster" training can help increase gains made during the initial intensive training period. Another goal of the study was to investigate the benefits of providing home-based booster training during the 4- to 6-week interval after the formal training ceased. The original investigation (Tye-Murray et al. 2022) compared the effects of talker familiarity and the relative benefits of the different types of training. We predicted that the children who received no additional training would retain their gains after completing the formal training. We also predicted that those children who completed the booster training would realize additional gains. DESIGN: Children, 6 to 12 years old, with hearing loss who had previously participated in the original randomized controlled study returned 4 to 6 weeks after its conclusion to take a follow-up speech perception assessment. The first group (n = 44) returned after receiving no formal intervention from the research team before the follow-up assessment. A second group of 40 children completed an additional 16 hours of speech perception training at home during a 4- to 6-week interval before the follow-up speech perception assessment. The home-based speech perception training was a continuation of the same training that was received in the laboratory, formatted to work on a PC tablet with a portable speaker. The follow-up speech perception assessment included measures of listening and speechreading, with test items spoken by both familiar (trained) and unfamiliar (untrained) talkers. RESULTS: In the group that did not receive the booster training, follow-up testing showed retention of all gains that were obtained immediately following the laboratory-based training. The group that received booster training during the same interval also maintained the benefits from the formal training, with some indication of minor improvement. CONCLUSIONS: Clinically, the present findings are extremely encouraging; the group that did not receive home-based booster training retained the benefits obtained during the laboratory-based training regimen. Moreover, the results suggest that self-paced booster training maintained the relative training gains associated with talker familiarity and training type seen immediately following laboratory-based training. Future aural rehabilitation programs should include maintenance training at home to supplement the speech perception training conducted under more formal conditions at school or in the clinic.


Subjects
Correction of Hearing Impairment, Deafness, Hearing Loss, Speech Perception, Child, Humans, Hearing Loss/rehabilitation, Lipreading, Correction of Hearing Impairment/methods
3.
J Speech Lang Hear Res ; 66(12): 5109-5128, 2023 Dec 11.
Article in English | MEDLINE | ID: mdl-37934877

ABSTRACT

PURPOSE: The COVID-19 pandemic led to the implementation of preventive measures that exacerbated communication difficulties for individuals with hearing loss. This study aims to explore how adults with hearing loss perceived the communication difficulties caused by the preventive measures, and their experiences with communication 1 year after these measures were adopted. METHOD: Individual semistructured interviews were conducted via videoconference with six adults with hearing loss from the province of Québec, Canada. Data were examined using qualitative content analysis. RESULTS: The study found that face masks and in-person work (i.e., as opposed to remote work) were important barriers to communication because of hindered lipreading and competing noise in many workplaces. In contrast, preventive measures that allowed visual information transmission (e.g., transparent face masks, fixed plastic partitions) were considered favorable for communication. Communication partners were perceived as playing an important role in communication success under preventive measures: familiar communication partners improved communication, whereas those with a poor attitude or poor strategies hindered communication. Participants found that videoconferences could provide satisfactory communication but were sometimes hindered by issues such as poor audiovisual quality or too many participants. CONCLUSIONS: This study identified reduced access to speech reading and lack of general awareness about hearing issues as key barriers to communication during the pandemic. The decreased communication capabilities were perceived to be most problematic at work and during health appointments, and tended to cause frustration, anxiety, self-esteem issues, and social isolation. Suggestions are outlined for current and future public health measures to better consider the experience of people with hearing loss.


Subjects
COVID-19, Deafness, Hearing Loss, Adult, Humans, Pandemics, COVID-19/prevention & control, Lipreading
4.
Neuroimage ; 282: 120391, 2023 11 15.
Article in English | MEDLINE | ID: mdl-37757989

ABSTRACT

There is considerable debate over how visual speech is processed in the absence of sound and whether neural activity supporting lipreading occurs in visual brain areas. Much of the ambiguity stems from a lack of behavioral grounding and neurophysiological analyses that cannot disentangle high-level linguistic and phonetic/energetic contributions from visual speech. To address this, we recorded EEG from human observers as they watched silent videos, half of which were novel and half of which were previously rehearsed with the accompanying audio. We modeled how the EEG responses to novel and rehearsed silent speech reflected the processing of low-level visual features (motion, lip movements) and a higher-level categorical representation of linguistic units, known as visemes. The ability of these visemes to account for the EEG - beyond the motion and lip movements - was significantly enhanced for rehearsed videos in a way that correlated with participants' trial-by-trial ability to lipread that speech. Source localization of viseme processing showed clear contributions from visual cortex, with no strong evidence for the involvement of auditory areas. We interpret this as support for the idea that the visual system produces its own specialized representation of speech that is (1) well-described by categorical linguistic features, (2) dissociable from lip movements, and (3) predictive of lipreading ability. We also suggest a reinterpretation of previous findings of auditory cortical activation during silent speech that is consistent with hierarchical accounts of visual and audiovisual speech perception.
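
The modeling approach summarized here is a lagged-regression ("temporal response function") style analysis. The sketch below shows the general logic of asking whether a viseme feature explains EEG variance beyond low-level motion and lip features; all signals are random stand-ins, and the feature definitions, lags, and estimator are assumptions rather than the authors' pipeline.

```python
# Lagged-regression sketch: does a viseme feature add explained EEG variance
# beyond motion and lip-aperture features? All data are synthetic placeholders.
import numpy as np
from numpy.linalg import lstsq

fs = 100                       # assumed sampling rate (Hz)
t_len = 60 * fs                # one minute of data
rng = np.random.default_rng(0)

eeg = rng.standard_normal(t_len)        # one EEG channel (stand-in)
motion = rng.standard_normal(t_len)     # frame-to-frame motion energy
lip = rng.standard_normal(t_len)        # lip aperture
viseme = rng.standard_normal(t_len)     # viseme-onset feature (stand-in)

def lagged(x, max_lag):
    """Stack time-lagged copies of x into a design matrix (lags 0..max_lag-1)."""
    return np.column_stack([np.roll(x, k) for k in range(max_lag)])

max_lag = 30                            # roughly 0-300 ms of lags
X_low = np.column_stack([lagged(motion, max_lag), lagged(lip, max_lag)])
X_full = np.column_stack([X_low, lagged(viseme, max_lag)])

def r_squared(X, y):
    beta, *_ = lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

# the key comparison: variance explained with vs. without the viseme feature
gain = r_squared(X_full, eeg) - r_squared(X_low, eeg)
print(f"added variance explained by visemes: {gain:.4f}")
```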


Subjects
Auditory Cortex, Speech Perception, Humans, Lipreading, Speech Perception/physiology, Brain/physiology, Auditory Cortex/physiology, Phonetics, Visual Perception/physiology
5.
Sensors (Basel) ; 23(4)2023 Feb 11.
Article in English | MEDLINE | ID: mdl-36850648

ABSTRACT

The current accuracy of speech recognition can reach over 97% on different datasets, but in noisy environments it is greatly reduced. Improving speech recognition performance in noisy environments is a challenging task. Because visual information is not affected by noise, researchers often use lip information to help improve speech recognition performance. This makes the performance of lip recognition and the effect of cross-modal fusion particularly important. In this paper, we try to improve the accuracy of speech recognition in noisy environments by improving the lip-reading performance and the cross-modal fusion effect. First, because the same lip movements can correspond to multiple meanings, we constructed a one-to-many mapping model between lips and speech, allowing the lip-reading model to consider which articulations are represented by the input lip movements. Audio representations are also preserved by modeling the inter-relationships between paired audiovisual representations. At the inference stage, the preserved audio representations can be extracted from memory through the learned inter-relationships using only video input. Second, a joint cross-fusion model using the attention mechanism can effectively exploit complementary intermodal relationships; the model calculates cross-attention weights on the basis of the correlations between joint feature representations and the individual modalities. Lastly, our proposed model achieved a 4.0% reduction in WER in a -15 dB SNR environment compared to the baseline method, and a 10.1% reduction in WER compared to speech recognition alone. The experimental results show that our method achieves a significant improvement over speech recognition models in different noise environments.
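
The cross-attention fusion step can be pictured with a short PyTorch sketch in which each modality queries the other and the attended streams are concatenated. The layer sizes, the use of torch.nn.MultiheadAttention, and the way the streams are combined are illustrative assumptions, not the paper's exact architecture.

```python
# Hedged sketch of cross-modal attention fusion between audio and lip features.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.audio_to_visual = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.visual_to_audio = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.out = nn.Linear(2 * dim, dim)

    def forward(self, audio, visual):
        # each modality queries the other; the joint representation is the
        # concatenation of the two attended streams
        a_att, _ = self.audio_to_visual(query=audio, key=visual, value=visual)
        v_att, _ = self.visual_to_audio(query=visual, key=audio, value=audio)
        return self.out(torch.cat([a_att, v_att], dim=-1))

# toy usage: batch of 2 utterances, 50 time steps, 256-dim features per modality
fusion = CrossModalFusion()
audio_feats = torch.randn(2, 50, 256)
visual_feats = torch.randn(2, 50, 256)
print(fusion(audio_feats, visual_feats).shape)   # torch.Size([2, 50, 256])
```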


Assuntos
Leitura Labial , Percepção da Fala , Humanos , Fala , Aprendizagem , Lábio
6.
Sci Rep ; 13(1): 928, 2023 01 17.
Article in English | MEDLINE | ID: mdl-36650188

ABSTRACT

In this work, we propose a framework to enhance the communication abilities of speech-impaired patients in an intensive care setting via lip reading. Medical procedures such as a tracheotomy cause the patient to lose the ability to utter speech, with little to no impact on habitual lip movement. Consequently, we developed a framework to predict the silently spoken text by performing visual speech recognition, i.e., lip-reading. In a two-stage architecture, frames of the patient's face are used to infer audio features as an intermediate prediction target, which are then used to predict the uttered text. To the best of our knowledge, this is the first approach to bring visual speech recognition into an intensive care setting. For this purpose, we recorded an audio-visual dataset in the University Hospital of Aachen's intensive care unit (ICU), with a language corpus hand-picked by experienced clinicians to be representative of their day-to-day routine. With a word error rate of 6.3%, the trained system reaches a sufficient overall performance to significantly increase the quality of communication between patient and clinician or relatives.
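
The two-stage idea (video frames mapped to audio-like features, then audio-like features decoded into text) can be sketched as two small PyTorch modules. The layer choices, the mel-spectrogram-like intermediate target, and the per-frame character logits are assumptions made for illustration, not the authors' architecture.

```python
# Minimal two-stage skeleton: lips -> acoustic features -> per-frame character logits.
import torch
import torch.nn as nn

class FramesToAudioFeatures(nn.Module):        # stage 1: lips -> acoustic features
    def __init__(self, n_mels=80):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),   # keep the time axis, pool space
        )
        self.proj = nn.Linear(32, n_mels)

    def forward(self, frames):                 # frames: (batch, 1, time, H, W)
        feats = self.backbone(frames).squeeze(-1).squeeze(-1)  # (batch, 32, time)
        return self.proj(feats.transpose(1, 2))                # (batch, time, n_mels)

class AudioFeaturesToText(nn.Module):          # stage 2: acoustic features -> characters
    def __init__(self, n_mels=80, vocab=40):
        super().__init__()
        self.rnn = nn.GRU(n_mels, 128, batch_first=True, bidirectional=True)
        self.head = nn.Linear(256, vocab)      # per-frame character logits (e.g., for CTC)

    def forward(self, mel):
        out, _ = self.rnn(mel)
        return self.head(out)

stage1, stage2 = FramesToAudioFeatures(), AudioFeaturesToText()
video = torch.randn(2, 1, 30, 64, 64)          # 2 clips, 30 frames of 64x64 mouth crops
print(stage2(stage1(video)).shape)             # torch.Size([2, 30, 40])
```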


Assuntos
Percepção da Fala , Humanos , Fala , Leitura Labial , Idioma , Cuidados Críticos
7.
J Psychosoc Nurs Ment Health Serv ; 61(4): 18-26, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36198121

ABSTRACT

The goal of the current interpretive phenomenological study grounded in Heidegger's philosophies was to explore the experience of lipreaders when society was masked during the coronavirus disease 2019 pandemic. Participants were prelingually deafened English-speaking adults who predominantly relied on lip-reading and speaking for communication. Twelve in-depth email interviews were conducted with respondents recruited via social media. Thematic techniques of Benner were employed, and six themes emerged: Limiting of World Resulting in Negative Emotions, Increased Prominence of Deafness, Balancing Safety and Communication Access, Creative Resourcefulness, Resilience and Personal Growth, and Passage of Time to Bittersweet Freedom. Insights from this study clarify the need for psychosocial support of lipreaders during times of restricted communication access and awareness of accommodations to facilitate inclusion. [Journal of Psychosocial Nursing and Mental Health Services, 61(4), 18-26.].


Subjects
COVID-19, Lipreading, Masks, Adult, Humans
8.
Brain Behav ; 13(2): e2869, 2023 02.
Article in English | MEDLINE | ID: mdl-36579557

ABSTRACT

INTRODUCTION: Few of us are skilled lipreaders, while most struggle with the task. The neural substrates that enable comprehension of connected natural speech via lipreading are not yet well understood. METHODS: We used a data-driven approach to identify brain areas underlying the lipreading of an 8-min narrative with participants whose lipreading skills varied extensively (range 6-100%, mean = 50.7%). The participants also listened to and read the same narrative. The similarity between individual participants' brain activity during the whole narrative, within and between conditions, was estimated by a voxel-wise comparison of the Blood Oxygenation Level Dependent (BOLD) signal time courses. RESULTS: Inter-subject correlation (ISC) of the time courses revealed that lipreading, listening to, and reading the narrative were largely supported by the same brain areas in the temporal, parietal and frontal cortices, precuneus, and cerebellum. Additionally, listening to and reading connected naturalistic speech activated higher-level linguistic processing in the parietal and frontal cortices more consistently than lipreading did, probably paralleling the limited understanding obtained via lipreading. Importantly, higher lipreading test scores and subjective estimates of comprehension of the lipread narrative were associated with activity in the superior and middle temporal cortex. CONCLUSIONS: Our new data illustrate that findings from prior studies using well-controlled repetitive speech stimuli and stimulus-driven data analyses are also valid for naturalistic connected speech. Our results might suggest an efficient use of brain areas dealing with phonological processing in skilled lipreaders.
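
Voxel-wise inter-subject correlation, the core measure in this abstract, reduces to averaging pairwise Pearson correlations of time courses across participants. A minimal sketch on synthetic data (array shapes and the pairwise averaging scheme are illustrative; the study's exact pipeline may differ):

```python
# Voxel-wise inter-subject correlation (ISC) on synthetic BOLD data.
import numpy as np

rng = np.random.default_rng(1)
n_subjects, n_voxels, n_timepoints = 10, 500, 240
bold = rng.standard_normal((n_subjects, n_voxels, n_timepoints))   # stand-in data

# z-score each voxel's time course so Pearson r reduces to a mean of products
bold = (bold - bold.mean(-1, keepdims=True)) / bold.std(-1, keepdims=True)

isc = np.zeros(n_voxels)
n_pairs = 0
for i in range(n_subjects):
    for j in range(i + 1, n_subjects):
        isc += (bold[i] * bold[j]).mean(-1)      # per-voxel Pearson correlation
        n_pairs += 1
isc /= n_pairs

print("mean ISC across voxels:", isc.mean())
```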


Subjects
Lipreading, Speech Perception, Humans, Female, Brain, Auditory Perception, Cognition, Magnetic Resonance Imaging
9.
J Child Lang ; 50(1): 27-51, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36503546

ABSTRACT

This study investigates how children aged two to eight years (N = 129) and adults (N = 29) use auditory and visual speech for word recognition. The goal was to bridge the gap between the apparent successes of visual speech processing in young children in visual-looking tasks and the apparent difficulties of speech processing in older children shown by explicit behavioural measures. Participants were presented with familiar words in audio-visual (AV), audio-only (A-only) or visual-only (V-only) speech modalities, then presented with target and distractor images, and looking to targets was measured. Adults showed high accuracy, with slightly less target-image looking in the V-only modality. Developmentally, looking was above chance for both AV and A-only modalities, but not in the V-only modality until 6 years of age (earlier on /k/-initial words). Flexible use of visual cues for lexical access develops throughout childhood.


Subjects
Lipreading, Speech Perception, Adult, Child, Humans, Child, Preschool, Speech, Language Development, Cues (Psychology)
10.
Am Ann Deaf ; 167(3): 303-312, 2022.
Article in English | MEDLINE | ID: mdl-36314163

ABSTRACT

Perceptual restoration occurs when the brain restores missing segments from speech under certain conditions. It has been investigated in the auditory modality, but minimal evidence has been collected for speechreading tasks. The authors measured perceptual restoration in speechreading by individuals with hearing loss and compared it to perceptual restoration in auditory speech by normally hearing individuals. Visual perceptual restoration for speechreading was measured in 33 individuals with profound hearing loss by blurring the keywords in silent video recordings of a speaker uttering a sentence. Auditory perceptual restoration was measured in 33 normally hearing individuals by distorting the keywords in spoken sentences. It was found that the amount of restoration was similar for speechreading through the visual modality by individuals with hearing loss and speech perception through the auditory modality by normally hearing individuals. These findings may facilitate understanding of speech processing by individuals with hearing loss.


Subjects
Deafness, Hearing Loss, Speech Perception, Adult, Humans, Lipreading, Hearing
11.
Nat Commun ; 13(1): 5168, 2022 09 07.
Article in English | MEDLINE | ID: mdl-36071056

ABSTRACT

The problem of lip-reading has become an important research challenge in recent years. The goal is to recognise speech from lip movements. Most of the lip-reading technologies developed so far are camera-based and require video recording of the target. However, these technologies have well-known limitations relating to occlusion and ambient lighting, along with serious privacy concerns. Furthermore, vision-based technologies are not useful for multi-modal hearing aids in the coronavirus (COVID-19) environment, where face masks have become a norm. This paper aims to solve the fundamental limitations of camera-based systems by proposing a radio frequency (RF) based lip-reading framework that is able to read lips under face masks. The framework employs Wi-Fi and radar technologies as enablers of RF sensing based lip-reading. A dataset comprising the vowels A, E, I, O, U and empty (static/closed lips) is collected using both technologies, with a face mask. The collected data are used to train machine learning (ML) and deep learning (DL) models. A high classification accuracy of 95% is achieved on the Wi-Fi data utilising neural network (NN) models. Moreover, similar accuracy is achieved by the VGG16 deep learning model on the collected radar-based dataset.
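
The classification step (labeling RF frames as one of the vowels or "empty") can be sketched with a generic classifier; the synthetic 128-dimensional features and the scikit-learn MLP below stand in for the authors' actual RF preprocessing, NN, and VGG16 models.

```python
# Hedged sketch: train a small neural network to classify vowel classes from RF features.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

classes = ["A", "E", "I", "O", "U", "empty"]
rng = np.random.default_rng(0)

# pretend each trial is a 128-dim feature vector extracted from the RF signal
X = rng.standard_normal((600, 128))
y = rng.integers(0, len(classes), size=600)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
clf.fit(X_train, y_train)
# on these random features the score is near chance; ~0.95 is reported on the real data
print("held-out accuracy:", clf.score(X_test, y_test))
```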


Assuntos
COVID-19 , Máscaras , COVID-19/prevenção & controle , Humanos , Leitura Labial , Redes Neurais de Computação , Equipamento de Proteção Individual
12.
PLoS One ; 17(9): e0275585, 2022.
Article in English | MEDLINE | ID: mdl-36178907

ABSTRACT

Visual input is crucial for understanding speech under noisy conditions, but there are hardly any tools to assess the individual ability to lipread. With this study, we wanted to (1) investigate how linguistic characteristics of the language on the one hand, and hearing impairment on the other, have an impact on lipreading abilities, and (2) provide a tool to assess lipreading abilities for German speakers. A total of 170 participants (22 prelingually deaf) completed the online assessment, which consisted of a subjective hearing impairment scale and silent videos in which different item categories (numbers, words, and sentences) were spoken. The task for our participants was to recognize the spoken stimuli just by visual inspection. We used different versions of one test and investigated the impact of item categories, word frequency in the spoken language, articulation, sentence frequency in the spoken language, sentence length, and differences between speakers on the recognition score. We found an effect of item categories, articulation, sentence frequency, and sentence length on the recognition score. With respect to hearing impairment, we found that higher subjective hearing impairment was associated with higher test scores. We did not find any evidence that prelingually deaf individuals show enhanced lipreading skills over people with postlingually acquired hearing impairment. However, we observed an interaction with education only in the prelingually deaf, not in the population with postlingually acquired hearing loss. This suggests that different factors contribute to enhanced lipreading abilities depending on the onset of hearing impairment (prelingual vs. postlingual). Overall, lipreading skills vary strongly in the general population independent of hearing impairment. Based on our findings we constructed a new and efficient lipreading assessment tool (SaLT) that can be used to test behavioral lipreading abilities in the German-speaking population.


Subjects
Deafness, Hearing Loss, Speech Perception, Humans, Language, Linguistics, Lipreading, Speech, Visual Perception
13.
J Neurosci ; 42(31): 6108-6120, 2022 08 03.
Article in English | MEDLINE | ID: mdl-35760528

ABSTRACT

Speech perception in noisy environments is enhanced by seeing facial movements of communication partners. However, the neural mechanisms by which audio and visual speech are combined are not fully understood. We explore MEG phase-locking to auditory and visual signals in MEG recordings from 14 human participants (6 females, 8 males) that reported words from single spoken sentences. We manipulated the acoustic clarity and visual speech signals such that critical speech information is present in auditory, visual, or both modalities. MEG coherence analysis revealed that both auditory and visual speech envelopes (auditory amplitude modulations and lip aperture changes) were phase-locked to 2-6 Hz brain responses in auditory and visual cortex, consistent with entrainment to syllable-rate components. Partial coherence analysis was used to separate neural responses to correlated audio-visual signals and showed non-zero phase-locking to auditory envelope in occipital cortex during audio-visual (AV) speech. Furthermore, phase-locking to auditory signals in visual cortex was enhanced for AV speech compared with audio-only speech that was matched for intelligibility. Conversely, auditory regions of the superior temporal gyrus did not show above-chance partial coherence with visual speech signals during AV conditions but did show partial coherence in visual-only conditions. Hence, visual speech enabled stronger phase-locking to auditory signals in visual areas, whereas phase-locking of visual speech in auditory regions only occurred during silent lip-reading. Differences in these cross-modal interactions between auditory and visual speech signals are interpreted in line with cross-modal predictive mechanisms during speech perception.SIGNIFICANCE STATEMENT Verbal communication in noisy environments is challenging, especially for hearing-impaired individuals. Seeing facial movements of communication partners improves speech perception when auditory signals are degraded or absent. The neural mechanisms supporting lip-reading or audio-visual benefit are not fully understood. Using MEG recordings and partial coherence analysis, we show that speech information is used differently in brain regions that respond to auditory and visual speech. While visual areas use visual speech to improve phase-locking to auditory speech signals, auditory areas do not show phase-locking to visual speech unless auditory speech is absent and visual speech is used to substitute for missing auditory signals. These findings highlight brain processes that combine visual and auditory signals to support speech understanding.
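
The basic phase-locking measure behind this analysis is spectral coherence between a speech envelope and a brain signal in the 2-6 Hz band. The sketch below computes it on synthetic signals; the partial-coherence step that removes the correlated visual signal is omitted, and the sampling rate and frequencies are assumptions.

```python
# Coherence between a synthetic 4 Hz "syllable-rate" envelope and a phase-locked signal.
import numpy as np
from scipy.signal import coherence

fs = 200                               # assumed sampling rate (Hz)
t = np.arange(0, 60, 1 / fs)
rng = np.random.default_rng(0)

envelope = np.sin(2 * np.pi * 4 * t) + 0.5 * rng.standard_normal(t.size)   # speech envelope
meg = 0.3 * np.sin(2 * np.pi * 4 * t + 0.8) + rng.standard_normal(t.size)  # brain signal

f, coh = coherence(envelope, meg, fs=fs, nperseg=fs * 2)
band = (f >= 2) & (f <= 6)
print("mean 2-6 Hz coherence:", coh[band].mean())
```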


Assuntos
Córtex Auditivo , Percepção da Fala , Córtex Visual , Estimulação Acústica , Córtex Auditivo/fisiologia , Percepção Auditiva , Feminino , Humanos , Leitura Labial , Masculino , Fala/fisiologia , Percepção da Fala/fisiologia , Córtex Visual/fisiologia , Percepção Visual/fisiologia
14.
eNeuro ; 9(3)2022.
Article in English | MEDLINE | ID: mdl-35728955

ABSTRACT

Speech is an intrinsically multisensory signal, and seeing the speaker's lips forms a cornerstone of communication in acoustically impoverished environments. Still, it remains unclear how the brain exploits visual speech for comprehension. Previous work debated whether lip signals are mainly processed along the auditory pathways or whether the visual system directly implements speech-related processes. To probe this, we systematically characterized dynamic representations of multiple acoustic and visual speech-derived features in source localized MEG recordings that were obtained while participants listened to speech or viewed silent speech. Using a mutual-information framework we provide a comprehensive assessment of how well temporal and occipital cortices reflect the physically presented signals and unique aspects of acoustic features that were physically absent but may be critical for comprehension. Our results demonstrate that both cortices feature a functionally specific form of multisensory restoration: during lip reading, they reflect unheard acoustic features, independent of co-existing representations of the visible lip movements. This restoration emphasizes the unheard pitch signature in occipital cortex and the speech envelope in temporal cortex and is predictive of lip-reading performance. These findings suggest that when seeing the speaker's lips, the brain engages both visual and auditory pathways to support comprehension by exploiting multisensory correspondences between lip movements and spectro-temporal acoustic cues.
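
The mutual-information logic (how much a cortical signal tells us about an acoustic feature, over and above the visible lip movements) can be sketched with a generic estimator. The Gaussian-copula and conditional-MI machinery typically used in such MEG studies is replaced here with a simple scikit-learn estimator on synthetic data.

```python
# Sketch: mutual information between a brain signal and speech features (synthetic data).
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n = 5000
lip_aperture = rng.standard_normal(n)
pitch = 0.6 * lip_aperture + 0.8 * rng.standard_normal(n)      # correlated speech features
meg = 0.5 * pitch + 0.5 * rng.standard_normal(n)               # signal tracking the pitch

mi_pitch = mutual_info_regression(pitch.reshape(-1, 1), meg, random_state=0)[0]
mi_lip = mutual_info_regression(lip_aperture.reshape(-1, 1), meg, random_state=0)[0]
print(f"MI(MEG; pitch) = {mi_pitch:.3f} nats, MI(MEG; lip aperture) = {mi_lip:.3f} nats")
```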


Subjects
Lipreading, Speech Perception, Acoustic Stimulation, Acoustics, Humans, Speech
15.
Sensors (Basel) ; 22(10)2022 May 13.
Article in English | MEDLINE | ID: mdl-35632141

ABSTRACT

Lipreading is a technique for analyzing sequences of lip movements and then recognizing the speech content of a speaker. The structure of our vocal organs limits the number of pronunciations we can make, leading to problems with homophones when speaking. On the other hand, different speakers will have various lip movements for the same word. To address these problems, this paper focuses on spatial-temporal feature extraction in word-level lipreading and proposes an efficient two-stream model to learn the relative dynamic information of lip motion. In this model, two CNN streams with different channel capacities are used to extract static features from single frames and dynamic information across multi-frame sequences, respectively. We explored a more effective convolution structure for each component in the front-end model and improved performance by about 8%. Then, according to the characteristics of the word-level lipreading dataset, we further studied the impact of the two sampling methods on the fast and slow channels. Furthermore, we discussed the influence of the methods used to fuse the front-end and back-end models under the two-stream network structure. Finally, we evaluated the proposed model on two large-scale lipreading datasets and achieved a new state of the art.
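
A two-stream front end of the kind described (a sparsely sampled, higher-capacity stream for static lip shape and a densely sampled, lower-capacity stream for motion) can be sketched in PyTorch as follows; the channel widths, sampling rate, and word-class count are illustrative assumptions, not the paper's configuration.

```python
# Hedged two-stream skeleton: slow stream (few frames, many channels) + fast stream.
import torch
import torch.nn as nn

class TwoStreamFrontEnd(nn.Module):
    def __init__(self):
        super().__init__()
        self.slow = nn.Sequential(                 # sparse frames, more channels: static detail
            nn.Conv3d(1, 64, (1, 7, 7), stride=(1, 2, 2), padding=(0, 3, 3)),
            nn.ReLU(), nn.AdaptiveAvgPool3d(1))
        self.fast = nn.Sequential(                 # all frames, fewer channels: motion
            nn.Conv3d(1, 8, (5, 7, 7), stride=(1, 2, 2), padding=(2, 3, 3)),
            nn.ReLU(), nn.AdaptiveAvgPool3d(1))
        self.classifier = nn.Linear(64 + 8, 500)   # e.g., 500 word classes

    def forward(self, clip):                       # clip: (batch, 1, time, H, W)
        slow_in = clip[:, :, ::4]                  # temporally subsample for the slow stream
        feats = torch.cat([self.slow(slow_in).flatten(1),
                           self.fast(clip).flatten(1)], dim=1)
        return self.classifier(feats)

model = TwoStreamFrontEnd()
print(model(torch.randn(2, 1, 29, 88, 88)).shape)  # torch.Size([2, 500])
```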


Assuntos
Algoritmos , Leitura Labial , Humanos , Aprendizagem , Movimento (Física) , Movimento
16.
Sensors (Basel) ; 22(9)2022 May 09.
Article in English | MEDLINE | ID: mdl-35591284

ABSTRACT

Concomitant with the recent advances in deep learning, automatic speech recognition and visual speech recognition (VSR) have received considerable attention. However, although VSR systems must identify speech from both frontal and profile faces in real-world scenarios, most VSR studies have focused solely on frontal face pictures. To address this issue, we propose an end-to-end sentence-level multi-view VSR architecture for faces captured from four different perspectives (frontal, 30°, 45°, and 60°). The encoder uses multiple convolutional neural networks with a spatial attention module to detect minor changes in the mouth patterns of similarly pronounced words, and the decoder uses cascaded local self-attention connectionist temporal classification to collect the details of local contextual information in the immediate vicinity, which results in a substantial performance boost and speedy convergence. To evaluate the proposed model, experiments on the OuluVS2 dataset were divided into the four different perspectives; the obtained performance improvements over the existing state of the art were 3.31% (0°), 4.79% (30°), 5.51% (45°), and 6.18% (60°), with a mean improvement of 4.95%, and the average performance improved by 9.1% compared with the baseline. Thus, the suggested design enhances the performance of multi-view VSR and boosts its usefulness in real-world applications.
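
The decoder described here builds on connectionist temporal classification (CTC), which aligns per-frame character probabilities to a transcription without frame-level labels. A minimal sketch of the CTC objective with placeholder tensors (vocabulary size, sequence lengths, and outputs are random stand-ins):

```python
# Minimal CTC-loss sketch with placeholder encoder outputs.
import torch
import torch.nn as nn

vocab_size = 30                          # characters + blank (index 0 is the CTC blank)
time_steps, batch = 75, 2

logits = torch.randn(time_steps, batch, vocab_size, requires_grad=True)
log_probs = logits.log_softmax(-1)       # (T, N, C) log-probabilities per frame
targets = torch.randint(1, vocab_size, (batch, 20))      # two transcriptions of length 20
input_lengths = torch.full((batch,), time_steps, dtype=torch.long)
target_lengths = torch.full((batch,), 20, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()                          # gradients would flow back into the VSR encoder
print("CTC loss:", loss.item())
```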


Assuntos
Leitura Labial , Redes Neurais de Computação , Atenção , Humanos , Idioma , Fala
17.
Am J Audiol ; 31(2): 453-469, 2022 Jun 02.
Article in English | MEDLINE | ID: mdl-35316072

ABSTRACT

PURPOSE: The goal of this review article is to reinvigorate interest in lipreading and lipreading training for adults with acquired hearing loss. Most adults benefit from being able to see the talker when speech is degraded; however, the effect size is related to their lipreading ability, which is typically poor in adults who have experienced normal hearing through most of their lives. Lipreading training has been viewed as a possible avenue for rehabilitation of adults with an acquired hearing loss, but most training approaches have not been particularly successful. Here, we describe lipreading and theoretically motivated approaches to its training, as well as examples of successful training paradigms. We discuss some extensions to auditory-only (AO) and audiovisual (AV) speech recognition. METHOD: Visual speech perception and word recognition are described. Traditional and contemporary views of training and perceptual learning are outlined. We focus on the roles of external and internal feedback and the training task in perceptual learning, and we describe results of lipreading training experiments. RESULTS: Lipreading is commonly characterized as limited to viseme perception. However, evidence demonstrates subvisemic perception of visual phonetic information. Lipreading words also relies on lexical constraints, not unlike auditory spoken word recognition. Lipreading has been shown to be difficult to improve through training, but under specific feedback and task conditions, training can be successful, and learning can generalize to untrained materials, including AV sentence stimuli in noise. The results on lipreading have implications for AO and AV training and for use of acoustically processed speech in face-to-face communication. CONCLUSION: Given its importance for speech recognition with a hearing loss, we suggest that the research and clinical communities integrate lipreading in their efforts to improve speech recognition in adults with acquired hearing loss.


Subjects
Deafness, Hearing Loss, Speech Perception, Adult, Humans, Lipreading, Speech
18.
HNO ; 70(6): 456-465, 2022 Jun.
Article in German | MEDLINE | ID: mdl-35024877

ABSTRACT

BACKGROUND: Many people benefit from the additional visual information provided by a speaker's lip movements when understanding speech; lipreading itself, however, is very error prone. Algorithms for lip reading with artificial intelligence based on artificial neural networks significantly improve word recognition but are not available for the German language. MATERIALS AND METHODS: A total of 1806 videoclips, each containing only one German-speaking person, were selected, split into word segments, and assigned to word classes using speech-recognition software. In 38,391 video segments with 32 speakers, 18 polysyllabic, visually distinguishable words were used to train and validate a neural network. The 3D Convolutional Neural Network and Gated Recurrent Units models and a combination of both models (GRUConv) were compared, as were different image sections and color spaces of the videos. The accuracy was determined over 5000 training epochs. RESULTS: Comparison of the color spaces did not reveal any relevant differences in correct classification rates, which ranged from 69% to 72%. With the image cropped to the lips, a significantly higher accuracy of 70% was achieved than with a crop of the entire speaker's face (34%). With the GRUConv model, the maximum accuracies were 87% with known speakers and 63% in the validation with unknown speakers. CONCLUSION: The neural network for lip reading, the first developed for the German language, shows very high accuracy, comparable to that of English-language algorithms. It also works with unknown speakers and can be generalized to more word classes.
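
One methodological point worth illustrating is the known- versus unknown-speaker evaluation: validation clips must come from speakers held out of training. The sketch below shows that grouping logic with placeholder features; only the split, not the model, is the point.

```python
# Speaker-independent ("unknown speaker") split: no speaker appears in both partitions.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n_clips = 1000
X = rng.standard_normal((n_clips, 64))           # stand-in clip embeddings
y = rng.integers(0, 18, n_clips)                 # 18 word classes, as in the study
speaker = rng.integers(0, 32, n_clips)           # 32 speakers, as in the study

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=speaker))
assert set(speaker[train_idx]).isdisjoint(set(speaker[test_idx]))
print(f"train clips: {len(train_idx)}, unknown-speaker test clips: {len(test_idx)}")
```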


Subjects
Deep Learning, Language, Algorithms, Artificial Intelligence, Humans, Lipreading
19.
J Neurosci ; 42(3): 435-442, 2022 01 19.
Article in English | MEDLINE | ID: mdl-34815317

ABSTRACT

In everyday conversation, we usually process the talker's face as well as the sound of the talker's voice. Access to visual speech information is particularly useful when the auditory signal is degraded. Here, we used fMRI to monitor brain activity while adult humans (n = 60) were presented with visual-only, auditory-only, and audiovisual words. The audiovisual words were presented in quiet and in several signal-to-noise ratios. As expected, audiovisual speech perception recruited both auditory and visual cortex, with some evidence for increased recruitment of premotor cortex in some conditions (including in substantial background noise). We then investigated neural connectivity using psychophysiological interaction analysis with seed regions in both primary auditory cortex and primary visual cortex. Connectivity between auditory and visual cortices was stronger in audiovisual conditions than in unimodal conditions, including a wide network of regions in posterior temporal cortex and prefrontal cortex. In addition to whole-brain analyses, we also conducted a region-of-interest analysis on the left posterior superior temporal sulcus (pSTS), implicated in many previous studies of audiovisual speech perception. We found evidence for both activity and effective connectivity in pSTS for visual-only and audiovisual speech, although these were not significant in whole-brain analyses. Together, our results suggest a prominent role for cross-region synchronization in understanding both visual-only and audiovisual speech that complements activity in integrative brain regions like pSTS.SIGNIFICANCE STATEMENT In everyday conversation, we usually process the talker's face as well as the sound of the talker's voice. Access to visual speech information is particularly useful when the auditory signal is hard to understand (e.g., background noise). Prior work has suggested that specialized regions of the brain may play a critical role in integrating information from visual and auditory speech. Here, we show a complementary mechanism relying on synchronized brain activity among sensory and motor regions may also play a critical role. These findings encourage reconceptualizing audiovisual integration in the context of coordinated network activity.
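
A psychophysiological interaction (PPI) regressor of the kind used here is typically the product of a mean-centered seed time course and a task regressor, entered into a GLM alongside both main effects. The sketch below builds one from synthetic time courses; the block design and regions are placeholders.

```python
# Building a PPI regressor and fitting a simple GLM on synthetic time courses.
import numpy as np

rng = np.random.default_rng(0)
n_scans = 200
seed = rng.standard_normal(n_scans)                        # e.g., primary auditory cortex
task = np.tile(np.repeat([0.0, 1.0], 10), 10)              # audiovisual blocks on/off
ppi = (seed - seed.mean()) * (task - task.mean())          # interaction regressor

design = np.column_stack([seed, task, ppi, np.ones(n_scans)])
target = rng.standard_normal(n_scans)                      # time course of a target region

beta, *_ = np.linalg.lstsq(design, target, rcond=None)
print("PPI (connectivity-by-condition) beta:", beta[2])
```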


Subjects
Auditory Cortex/physiology, Language, Lipreading, Nerve Net/physiology, Speech Perception/physiology, Visual Cortex/physiology, Visual Perception/physiology, Adult, Aged, Aged, 80 and over, Auditory Cortex/diagnostic imaging, Female, Humans, Magnetic Resonance Imaging, Male, Middle Aged, Nerve Net/diagnostic imaging, Visual Cortex/diagnostic imaging, Young Adult
20.
Psychon Bull Rev ; 29(2): 600-612, 2022 Apr.
Article in English | MEDLINE | ID: mdl-34671936

ABSTRACT

Human face-to-face communication is multimodal: it comprises speech as well as visual cues, such as articulatory and limb gestures. In the current study, we assess how iconic gestures and mouth movements influence audiovisual word recognition. We presented video clips of an actress uttering single words accompanied, or not, by more or less informative iconic gestures. For each word we also measured the informativeness of the mouth movements from a separate lipreading task. We manipulated whether gestures were congruent or incongruent with the speech, and whether the words were audible or noise vocoded. The task was to decide whether the speech from the video matched a previously seen picture. We found that congruent iconic gestures aided word recognition, especially in the noise-vocoded condition, and the effect was larger (in terms of reaction times) for more informative gestures. Moreover, more informative mouth movements facilitated performance in challenging listening conditions when the speech was accompanied by gestures (either congruent or incongruent) suggesting an enhancement when both cues are present relative to just one. We also observed (a trend) that more informative mouth movements speeded up word recognition across clarity conditions, but only when the gestures were absent. We conclude that listeners use and dynamically weight the informativeness of gestures and mouth movements available during face-to-face communication.


Subjects
Gestures, Speech Perception, Comprehension, Humans, Lipreading, Speech